

Reinforcement Learning in Strategy-Based and Atari Games: A Review of Google DeepMind's Innovations

Shaheen, Abdelrhman, Badr, Anas, Abohendy, Ali, Alsaadawy, Hatem, Alsayad, Nadine

arXiv.org Artificial Intelligence

Reinforcement Learning (RL) has been widely used in many applications, particularly in gaming, which serves as an excellent training ground for AI models. Google DeepMind has pioneered innovations in this field, employing reinforcement learning algorithms, including model-based, model-free, and deep Q-network approaches, to create advanced AI models such as AlphaGo, AlphaGo Zero, and MuZero. AlphaGo, the initial model, integrates supervised learning and reinforcement learning to master the game of Go, surpassing professional human players. AlphaGo Zero refines this approach by eliminating reliance on human gameplay data, instead utilizing self-play for enhanced learning efficiency. MuZero further extends these advancements by learning the underlying dynamics of game environments without explicit knowledge of the rules, achieving adaptability across various games, including complex Atari games. This paper reviews the significance of reinforcement learning applications in Atari and strategy-based games, analyzing these three models, their key innovations, training processes, challenges encountered, and improvements made. Additionally, we discuss advancements in the field of gaming, including MiniZero and multi-agent models, highlighting future directions and emerging AI models from Google DeepMind.


Inference Scaling Reshapes AI Governance

Ord, Toby

arXiv.org Artificial Intelligence

The shift from scaling up the pre-training compute of AI systems to scaling up their inference compute may have profound effects on AI governance. The nature of these effects depends crucially on whether this new inference compute will primarily be used during external deployment or as part of a more complex training programme within the lab. Rapid scaling of inference-at-deployment would: lower the importance of open-weight models (and of securing the weights of closed models), reduce the impact of the first human-level models, change the business model for frontier AI, reduce the need for power-intense data centres, and derail the current paradigm of AI governance via training compute thresholds. Rapid scaling of inference-during-training would have more ambiguous effects that range from a revitalisation of pre-training scaling to a form of recursive self-improvement via iterated distillation and amplification. The intense year-on-year scaling up of AI training runs has been one of the most dramatic and stable markers of the Large Language Model era. Indeed, it had been widely taken to be a permanent fixture of the AI landscape and the basis of many approaches to AI governance. But recent reports from unnamed employees at the leading labs suggest that their attempts to scale up pre-training substantially beyond the size of GPT-4 have led to only modest gains, insufficient to justify continuing such scaling and perhaps even insufficient to warrant public deployment of those models (Hu & Tong, 2024). A possible reason is that they are running out of high-quality training data. While the scaling laws might still be operating (given sufficient compute and data, the models would keep improving), the ability to harness them through rapid scaling of pre-training may not.


Monte Carlo Tree Search (MCTS) in AlphaGo Zero

#artificialintelligence

In a Go game, AlphaGo Zero uses Monte Carlo Tree Search (MCTS) to build a local policy for sampling the next move. MCTS explores possible moves and records the results in a search tree; as more searches are performed, the tree grows and its statistics become more informative. To make a move, AlphaGo Zero runs 1,600 searches and then constructs a local policy from the resulting visit counts.
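The select/expand/simulate/backpropagate cycle, and the final "local policy from visit counts" step, can be sketched in miniature. The sketch below is illustrative only: it plays a toy Nim game (take 1–3 stones, taking the last stone wins) and uses plain UCT with random rollouts, whereas AlphaGo Zero guides search with a neural network (PUCT) and has no rollouts; the game, node structure, and constants here are assumptions for the demo, with only the 1,600-simulation budget taken from the text.

```python
import math
import random

TAKE = (1, 2, 3)  # Nim: remove 1-3 stones; whoever takes the last stone wins

def legal_moves(pile):
    return [m for m in TAKE if m <= pile]

class Node:
    def __init__(self, pile, parent=None):
        self.pile, self.parent = pile, parent
        self.children = {}            # move -> child Node
        self.visits, self.wins = 0, 0.0

def uct_select(node, c=1.4):
    # Pick the child maximising win rate plus an exploration bonus (UCT).
    return max(node.children.items(),
               key=lambda kv: kv[1].wins / kv[1].visits
               + c * math.sqrt(math.log(node.visits) / kv[1].visits))

def rollout(pile):
    # Random playout; returns 1.0 if the player to move at `pile` wins.
    turn = 0
    while pile > 0:
        pile -= random.choice(legal_moves(pile))
        if pile == 0:
            return 1.0 if turn == 0 else 0.0
        turn ^= 1
    return 0.0  # empty pile: the player to move has already lost

def search(root_pile, n_sims=1600):
    root = Node(root_pile)
    for _ in range(n_sims):
        node = root
        # 1. Selection: descend while the node is fully expanded.
        while node.pile > 0 and len(node.children) == len(legal_moves(node.pile)):
            _, node = uct_select(node)
        # 2. Expansion: add one untried move.
        if node.pile > 0:
            move = random.choice(
                [m for m in legal_moves(node.pile) if m not in node.children])
            node.children[move] = Node(node.pile - move, parent=node)
            node = node.children[move]
        # 3. Simulation, scored for the player who just moved into `node`.
        value = 1.0 - rollout(node.pile)
        # 4. Backpropagation: flip the perspective at each level.
        while node is not None:
            node.visits += 1
            node.wins += value
            value = 1.0 - value
            node = node.parent
    # Local policy: normalised visit counts over the root's moves.
    total = sum(ch.visits for ch in root.children.values())
    return {m: ch.visits / total for m, ch in root.children.items()}
```

For a pile of 5 stones the only winning move is to take 1 (leaving a multiple of 4), and after 1,600 simulations the visit-count policy concentrates on it.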


Top Real World Applications of Reinforcement Learning in 2022

#artificialintelligence

Reinforcement Learning is a subfield of Machine Learning in which an agent explores an environment to learn how to perform specific tasks by taking actions with a good outcome and avoiding those with a bad one. A reinforcement learning model will learn from its experiences and will identify which actions lead to the best rewards. In reinforcement learning, the agent takes action based on the state of the environment, and the environment will return the reward and the next state. The agent employs a trial and error method to learn. It initially takes random actions and identifies which actions lead to long-term rewards over time.
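The loop described above — the agent acts on the current state, the environment returns a reward and the next state, and trial and error gradually favours actions with long-term payoff — can be shown concretely. The sketch below is a minimal illustration, not any particular production system: it uses tabular Q-learning with epsilon-greedy exploration on an invented five-state corridor where only the goal state pays a reward.

```python
import random
from collections import defaultdict

# Toy environment: a 5-state corridor. The agent starts at 0; actions move
# left (-1) or right (+1); reaching state 4 yields reward +1 and ends the episode.
GOAL = 4
ACTIONS = (-1, +1)

def step(state, action):
    nxt = min(max(state + action, 0), GOAL)
    reward = 1.0 if nxt == GOAL else 0.0
    return nxt, reward, nxt == GOAL      # next state, reward, done

def train(episodes=500, alpha=0.5, gamma=0.9, eps=0.1):
    Q = defaultdict(float)               # (state, action) -> value estimate
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            # Trial and error: explore with probability eps, else exploit
            # (ties broken at random so early behaviour is not biased).
            if random.random() < eps:
                a = random.choice(ACTIONS)
            else:
                a = max(ACTIONS, key=lambda x: (Q[(s, x)], random.random()))
            s2, r, done = step(s, a)
            # Update toward the reward plus the discounted best next value.
            best_next = 0.0 if done else max(Q[(s2, x)] for x in ACTIONS)
            Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
            s = s2
    return Q
```

After training, the learned values identify "move right" as the action with the best long-term reward in every non-goal state, even though only the final step is ever rewarded directly.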


Leela Zero Score: a Study of a Score-based AlphaGo Zero

Pasqualini, Luca, Parton, Maurizio, Morandin, Francesco, Amato, Gianluca, Gini, Rosa, Metta, Carlo

arXiv.org Artificial Intelligence

AlphaGo, AlphaGo Zero, and all of their derivatives can play with superhuman strength because they are able to predict the win-lose outcome with great accuracy. However, Go as a game is decided by a final score difference, and in final positions AlphaGo plays suboptimal moves: this is not surprising, since AlphaGo is completely unaware of the final score difference, all winning final positions being equivalent from the winrate perspective. This can be an issue, for instance when trying to learn the "best" move or to play with an initial handicap. Moreover, there is the theoretical quest of the "perfect game", that is, the minimax solution. Thus, a natural question arises: is it possible to train a successful Reinforcement Learning agent to predict score differences instead of winrates? No empirical or theoretical evidence can be found in the literature to support the folklore statement that "this does not work". In this paper we present Leela Zero Score, a software designed to support or disprove the "does not work" statement. Leela Zero Score is designed on the open-source solution known as Leela Zero, and is trained on a 9x9 board to predict score differences instead of winrates. We find that the training produces a rational player, and we analyze its style against a strong amateur human player, to find that it is prone to some mistakes when the outcome is close. We compare its strength against SAI, an AlphaGo Zero-like software working on the 9x9 board, and find that the training of Leela Zero Score has reached a premature convergence to a player weaker than SAI.


AI in Games: Techniques, Challenges and Opportunities

Yin, Qiyue, Yang, Jun, Ni, Wancheng, Liang, Bin, Huang, Kaiqi

arXiv.org Artificial Intelligence

With the breakthrough of AlphaGo, AI in human-computer games has become a very hot topic attracting researchers all around the world, as games usually serve as an effective standard for testing artificial intelligence. Various game AI systems (AIs) have been developed, such as Libratus, OpenAI Five and AlphaStar, beating professional human players. In this paper, we survey recent successful game AIs, covering board game AIs, card game AIs, first-person shooting game AIs and real-time strategy game AIs. Through this survey, we 1) compare the main difficulties among different kinds of games for the intelligent decision-making field; 2) illustrate the mainstream frameworks and techniques for developing professional-level AIs; 3) raise the challenges or drawbacks in the current AIs for intelligent decision making; and 4) try to propose future trends in games and intelligent decision-making techniques. Finally, we hope this brief review can provide an introduction for beginners and inspire insights for researchers in the field of AI in games.


Why AI Chess Champs Are Not Taking Over the World

#artificialintelligence

At one time, the AI that beat humans at chess calculated strategies by studying the outcomes of human moves. In October 2017, the DeepMind team published details of a new Go-playing system, AlphaGo Zero, that studied no human games at all. Instead, it started with the game's rules and played against itself. The first moves it made were completely random. After each game, it folded in new knowledge of what led to a win and what didn't.


A Quick Primer on Self-Play in Deep Reinforcement Learning

#artificialintelligence

"Train tirelessly to defeat the greatest enemy, yourself, and to discover the greatest master, yourself" DeepMind has created AI that will crush any human player in Go, Chess, Shogi, and Starcraft 2. OpenAI has made similar strides in complex strategy games, notably in Dota 2. The agents in these games all achieved mastery using deep reinforcement learning. Yet, this is only part of the story. What was the magic sauce that sent these systems' playing ability out of the atmosphere? A simple framework called self-play, where your opponent is yourself. Self-play is a framework where an agent learns to play a game by playing against itself.
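A minimal version of that framework — one agent, playing both sides of a game against itself and learning from the outcomes — can be sketched as follows. This is an illustrative stand-in, not DeepMind's or OpenAI's method: it learns a tabular value function for a toy Nim game (take 1–3 stones, last stone wins) by temporal-difference updates from self-play results; the game, table, and constants are all assumptions for the demo.

```python
import random

TAKE = (1, 2, 3)   # Nim: take 1-3 stones; whoever takes the last stone wins

def self_play_train(start=10, games=2000, alpha=0.1, eps=0.2):
    # V[pile] = estimated win probability for the player about to move.
    V = {p: 0.5 for p in range(start + 1)}
    V[0] = 0.0      # no stones left: the player to move has already lost
    for _ in range(games):
        pile, history = start, []
        while pile > 0:
            moves = [m for m in TAKE if m <= pile]
            if random.random() < eps:
                m = random.choice(moves)        # explore
            else:
                # Both sides use the SAME policy: leave the opponent the
                # position with the lowest estimated value.
                m = min(moves, key=lambda m: V[pile - m])
            history.append(pile)
            pile -= m
        # The player who moved last won. Walk back through the visited
        # positions, alternating win/loss from each mover's perspective.
        outcome = 1.0
        for p in reversed(history):
            V[p] += alpha * (outcome - V[p])
            outcome = 1.0 - outcome
    return V
```

With no opponent other than itself, the agent discovers Nim's structure: piles that are multiples of 4 are losing for the player to move, and the learned values reflect that — the same self-improvement dynamic, in miniature, that self-play provides at scale.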


Artificial Intelligence

#artificialintelligence

Learn to write programs using the foundational AI algorithms powering everything from NASA's Mars Rover to DeepMind's AlphaGo Zero.


Space Force scientist says it's 'imperative' military uses human augmentation by employing AI agents

Daily Mail - Science & tech

Combining humans with machines to create superhuman intelligence may soon no longer be the plot of science-fiction films, as the US Space Force's chief scientist says it will happen in 'the coming decade.' Dr. Joel Mozer, speaking at an event at the Air Force Research Laboratory Wednesday, announced we are entering the age of 'human augmentation,' which is crucial to the US's national defense in order to not 'fall behind our strategic competitors.' However, his proposal does not turn humans into cyborgs, but employs 'AI agents' to assist with strategic military planning. Mozer highlighted the abilities of AlphaGo Zero, developed by a Google subsidiary, which was able to train itself to play the game of Go at a master level in just a few weeks. Mozer suggests these extraordinary capabilities can lead to superhuman performance by combining human ingenuity with the power, speed and efficiency of machines.